Data Visualization
10 January, 2024
You already…
Please install and load the following packages
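The code later in these slides uses ggplot2 and dplyr (both part of the tidyverse) and openintro. Treat this set as an assumption, not the official course list. The sketch below installs any of these packages that are missing and then loads them:

```r
# Assumed package set, inferred from the code used in these slides
pkgs <- c("tidyverse", "openintro")

# Install only the packages that are not installed yet
to_install <- setdiff(pkgs, rownames(installed.packages()))
if (length(to_install) > 0) install.packages(to_install)

library(tidyverse)  # attaches ggplot2 and dplyr (dplyr re-exports %>%)
library(openintro)  # datasets such as yrbss_samp and tourism
```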
Access the lecture slides from the course landing page.
I am Ayush.
I am a researcher working at the intersection of data, law, development and economics.
I teach Data Science using R at the Gokhale Institute of Politics and Economics.
I am an RStudio (Posit) certified tidyverse instructor.
I am a researcher at the Oxford Poverty and Human Development Initiative (OPHI), University of Oxford.
Reach me
ayush.ap58@gmail.com
ayush.patel@gipe.ac.in
ggplot2 takes a different approach to graphics than other plotting packages in R. It is based on the grammar of graphics and builds a plot layer by layer from a small set of components.
Some of the terminology used in ggplot2:
- data: what we want to visualize; it consists of variables
- geoms: geometric objects drawn to represent the data, such as bars, lines, and points
- aesthetics: visual properties of geoms, such as x and y position, line colour, point shape, etc.
- mappings: links from data values to aesthetics
"Effective design should start with a visual task analysis, determine the set of visual queries to be supported by a design, and then use color, form, and space to efficiently serve those queries." - Colin Ware
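The terms above map directly onto code. In this minimal sketch, mpg is the data, aes() holds the mappings from the variables displ and hwy to the x and y aesthetics, and geom_point() is the geom. The colour passed outside aes() is a fixed aesthetic. It applies to every point and is not mapped from the data.

```r
library(ggplot2)

# data: the mpg data frame (ships with ggplot2)
# mapping: aes() maps displ to the x aesthetic and hwy to the y aesthetic
# geom: geom_point() draws one point per row of the data
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(colour = "steelblue")  # fixed aesthetic, not mapped to data

p
```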
We will use the mpg and diamonds datasets, both of which ship with ggplot2, to learn data visualization. You can run ?mpg and ?diamonds to understand the variables in each dataset.
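Besides the help pages, dplyr's glimpse() gives a quick overview of each variable's name, type, and first few values:

```r
library(ggplot2)  # provides the mpg and diamonds datasets
library(dplyr)    # provides glimpse()

glimpse(mpg)       # 234 rows, 11 variables
glimpse(diamonds)  # 53,940 rows, 10 variables
```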
ggplot()
An example - Plotting City Miles by Fuel Type
The plot is built from ggplot(), geom() & aes(), theme(), and labs().
library(ggplot2)

ggplot(data = mpg) + # the plot area and data
geom_boxplot(
aes(fl, cty),
alpha = 0.5,
fill = "steelblue"
)+ # geom and aesthetic
geom_jitter(
aes(fl, cty),
alpha = 0.5,
colour = "steelblue"
)+ # another layer with aesthetics
theme_bw()+ # theme
labs(
x = "Fuel Type",
y = "City Miles (per gallon)",
title = "City Miles (per gallon) of Cars by Fuel Type"
) # labels
The yrbss_samp dataset from the openintro package
Using the yrbss_samp data, plot a bar chart to see the distribution of males and females.
Plotting mean displacement by type of the car
Note: facet_wrap() only uses a discrete variable.
library(dplyr) # group_by(), summarise() and the %>% pipe
library(ggplot2)

mpg %>%
group_by(class) %>%
summarise(mean_displacement = mean(displ)) %>%
ggplot(aes(x = reorder(class, mean_displacement), y = mean_displacement)) +
geom_col(width = .65, fill = "#118B60") +
coord_flip()+
labs(title = "2 Seater Has the Highest Displacement",
subtitle = "Mean Displacement (in litres) vs Class of the Model",
y = "Mean Displacement",
x = "Type of the car",
caption = "Data Source : mpg | Analysis by Student")+
theme_bw()
strength_training_variable into categories, with 0 as ‘no training’, 1-3 as ‘low training’, 3-5 as ‘moderate training’, and more than 5 as ‘high training’.
By default, geom_bar() uses the count of each category as the y variable.
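This sketch contrasts two ways of building bars on the mpg data. geom_bar() takes only an x mapping and computes the counts itself through stat_count(). geom_col() needs pre-computed values passed as y. The last plot splits the count chart into panels with facet_wrap() using drv, a discrete variable. The choice of class and drv is illustrative only.

```r
library(ggplot2)
library(dplyr)

# geom_bar(): only x is mapped; stat_count() computes the bar heights (counts)
p_count <- ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "#118B60") +
  theme_bw()

# The same bars from pre-computed counts: geom_col() needs an explicit y
class_counts <- mpg %>% count(class)
p_col <- ggplot(class_counts, aes(x = class, y = n)) +
  geom_col(fill = "#118B60") +
  theme_bw()

# facet_wrap() draws one panel per level of a discrete variable,
# here drv (drive train: "4", "f", "r")
p_facet <- p_count + facet_wrap(~ drv)
```

In everyday use, geom_bar() is the shorter option when the data holds one row per observation. geom_col() fits better once you have already summarised the data, as in the mean displacement example.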
Line chart: useful for showing trends over time.
Using the tourism data, draw a line chart of the number of visitors over the years.
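The tourism data from openintro records visitors to Turkey by year. Its visitor_count_tho column is in thousands. The sketch below draws a basic line chart from it and adds a point for each yearly observation. The colours and title are illustrative choices.

```r
library(ggplot2)
library(dplyr)

tourism <- openintro::tourism  # yearly tourism data for Turkey

p_line <- tourism %>%
  ggplot(aes(x = year, y = visitor_count_tho)) +
  geom_line(colour = "#E54B4B") +
  geom_point(colour = "#E54B4B") +  # mark each yearly observation
  labs(x = "Year",
       y = "Number of Visitors (thousands)",
       title = "Number of Tourists in Turkey over the Years") +
  theme_bw()

p_line
```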
Area chart: a modification of the line chart that fills the area below the line.
library(ggplot2)
library(dplyr) # for the %>% pipe

tourism <- openintro::tourism
tourism %>%
ggplot(aes(x= year, y = visitor_count_tho)) +
geom_area(fill = "#69b3a2", alpha = 0.4) + # fill the area below the line
geom_line(aes(group = 1), colour = "#E54B4B", linewidth = 1) + # linewidth needs ggplot2 >= 3.4.0
ggtitle("Increase in Number of Tourists in Turkey") +
xlab("Year") +
ylab("Number of Visitors (thousands)") +
theme_bw()
gender variable
unempl dataset and make a line chart of the rate of unemployment over the years. Make sure that you format the graph properly, as required.